Biclustering: Methods, Software and Application
Abstract
Over the past 10 years, biclustering has become popular not only in biological data analysis but also in other applications involving high-dimensional two-way datasets. The technique clusters rows and columns simultaneously, rather than clustering only rows or only columns. Biclustering retrieves subgroups of objects that are similar on one subset of variables and different on the remaining variables. This dissertation focuses on improving and advancing biclustering methods. Since most existing methods are extremely sensitive to variations in parameters and data, we developed an ensemble method to overcome these limitations. More stable and reliable biclusters can be retrieved in two ways: either by running algorithms with different parameter settings, or by running them on subsamples or bootstrap samples of the data, and then combining the results. To this end, we designed a software package containing a collection of bicluster algorithms for different clustering tasks and data scales, developed several new ways of visualizing bicluster solutions, and adapted traditional cluster validation indices (e.g. the Jaccard index) to the bicluster framework. Finally, we applied biclustering to marketing data. Well-established algorithms were adjusted to slightly different data situations, and a new method specially adapted to ordinal data was developed. In order to test this method on artificial data, we generated correlated ordinal random values. This dissertation introduces two methods for generating such values given a probability vector and a correlation structure. All the methods outlined in this dissertation are freely available in the R packages biclust and orddata. Numerous examples in this work illustrate how to use the methods and software.
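As a rough illustration of how a traditional validation index such as the Jaccard index might be carried over to biclusters, the sketch below treats each bicluster as the set of (row, column) cells it covers and compares two biclusters by the overlap of those cell sets. This is only one plausible adaptation and is not taken from the dissertation or from the biclust package; the function names and the convention for two empty biclusters are assumptions made for the example.

```python
from itertools import product


def bicluster_cells(rows, cols):
    """Return the set of (row, col) cells covered by a bicluster."""
    return set(product(rows, cols))


def bicluster_jaccard(rows_a, cols_a, rows_b, cols_b):
    """Jaccard index of two biclusters, computed on their covered cells.

    Assumption: a bicluster is fully described by its row and column
    index sets, so its cells are the Cartesian product of the two.
    """
    cells_a = bicluster_cells(rows_a, cols_a)
    cells_b = bicluster_cells(rows_b, cols_b)
    union = cells_a | cells_b
    if not union:
        # Convention chosen for this sketch: two empty biclusters score 0.
        return 0.0
    return len(cells_a & cells_b) / len(union)


if __name__ == "__main__":
    # Two biclusters sharing rows {1, 2} and column {1}.
    score = bicluster_jaccard([0, 1, 2], [0, 1], [1, 2, 3], [1, 2])
    print(round(score, 3))
```

In an ensemble setting, a pairwise score of this kind could be used to judge how stable a bicluster is across runs with different parameters or resampled data, although the dissertation's own combination procedure may differ.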
Similar Resources
A Workflow for the Application of Biclustering to Mass Spectrometry Data
Biclustering techniques have been successfully applied to microarray data and are beginning to be applied to mass spectrometry data, a high-throughput technology for proteomic data analysis that has been an active research area in recent years. In this work, we propose a novel workflow for applying biclustering to MALDI-TOF mass spectrometry data, supported ...
Bi-Force: large-scale bicluster editing and its application to gene expression data biclustering
The explosion of biological data has dramatically reshaped today's biological research. The need to integrate and analyze high-dimensional biological data on a large scale is driving the development of novel bioinformatics approaches. Biclustering, also known as 'simultaneous clustering' or 'co-clustering', has been successfully utilized to discover local patterns in gene expression data an...
A Framework to Analyze Biclustering Results on Microarray Experiments
Microarray technology produces large amounts of information to be manipulated by analysis methods, such as biclustering algorithms, to extract new knowledge. All-purpose multivariate data visualization tools are usually not enough for studying microarray experiments. Additionally, clustering tools do not provide means of simultaneous visualization of all the biclusters obtained. We present an i...
Qualitative Biclustering with Bioconductor Package rqubic
Biclustering has been proposed and found very useful for discovering gene regulation patterns from gene expression microarrays. Several quantitative algorithms, among them CC and BIMAX, have been implemented in R, mainly in the biclust package. To the best of our knowledge, no qualitative biclustering methods have been implemented so far. Therefore we introduce rqubic, a Bioconductor package impl...
Biclustering of gene expression data
Biclustering is an important problem that arises in diverse applications, including the analysis of gene expression and drug interaction data. A large number of clustering approaches have been proposed for gene expression data obtained from microarray experiments. However, the results from the application of standard clustering methods to genes are limited. This limitation is imposed by the exi...